# Multilingual Caption Generation
Paligemma2 3b Pt 896
PaliGemma 2 is a multimodal vision-language model that combines image and text inputs to generate text outputs. It supports multiple languages and is suitable for various vision-language tasks.
Image-to-Text
Transformers

google
2,536
22
Paligemma 3b Ft Cococap 224
PaliGemma is a versatile, lightweight vision-language model (VLM) that supports multilingual input and output and is suited to a range of vision-language tasks.
Image-to-Text
Transformers

google
209
1
Paligemma 3b Pt 896
PaliGemma is a versatile lightweight vision-language model (VLM) that supports image and text inputs and generates text outputs. It has multilingual capabilities.
Image-to-Text
Transformers

google
1,788
119
Paligemma 3b Ft Science Qa 224
PaliGemma is a versatile, lightweight vision-language model (VLM) that takes image and text input and generates text output, making it suitable for a range of vision-language tasks.
Image-to-Text
Transformers

google
113
1
Featured Recommended AI Models